Learning-from-Observation (LfO) is a robot teaching framework for programming operations through few-shot human demonstrations. While most previous LfO systems operate on visual demonstrations alone, recent research on robot teaching has shown that verbal instruction makes recognition more robust and teaching interactive. To the best of our knowledge, however, few solutions have been proposed for LfO that utilize verbal instruction, namely multimodal LfO. This paper proposes a practical pipeline for multimodal LfO. For input, the user temporarily stops their hand movements to match the granularity of human instructions with the granularity of robot execution. The pipeline recognizes tasks based on step-by-step verbal instructions that accompany the demonstrations. In addition, recognition is made robust through interactions with the user. We test the pipeline on a real robot and show that a user can successfully teach multiple operations through multimodal demonstrations. The results suggest the utility of the proposed pipeline for multimodal LfO.
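As a rough illustration of the pause-based input convention described in this abstract, the following Python sketch (our own illustration, not the authors' code; the array names and thresholds are assumptions) segments a demonstration at moments when the hand velocity stays low, so that each segment can be paired with one verbal instruction:

```python
# Minimal sketch of pause-based segmentation: the user briefly stops
# their hand between steps, so sustained low hand speed marks a boundary
# between task-level segments. Thresholds are illustrative assumptions.
import numpy as np

def segment_by_pauses(hand_positions, timestamps,
                      speed_thresh=0.02, min_pause=0.5):
    """Split a demonstration into segments separated by hand pauses.

    hand_positions: (N, 3) array of hand positions in meters.
    timestamps:     (N,) array of times in seconds.
    Returns a list of (start_idx, end_idx) pairs, one per segment.
    """
    dt = np.diff(timestamps)
    speeds = np.linalg.norm(np.diff(hand_positions, axis=0), axis=1) / dt
    paused = speeds < speed_thresh  # True where the hand is (nearly) still

    segments, seg_start, pause_start = [], 0, None
    for i, p in enumerate(paused):
        if p and pause_start is None:
            pause_start = i
        elif not p and pause_start is not None:
            # A pause long enough closes the current segment.
            if timestamps[i] - timestamps[pause_start] >= min_pause:
                segments.append((seg_start, pause_start))
                seg_start = i
            pause_start = None
    segments.append((seg_start, len(hand_positions) - 1))
    return segments

# Each segment can then be paired with the verbal instruction uttered
# during it, e.g. via timestamp overlap with speech-recognition output.
```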
Robot developers build various types of robots to satisfy users' varied demands. These demands depend on users' backgrounds, so the robot suitable for each user may differ. If a developer offers a user a robot different from the usual one, the robot-specific software must be changed. On the other hand, robot-software developers would like to reuse their software as much as possible to reduce development effort. We propose a system design that takes hardware-level reusability into account. For this purpose, we start from the learning-from-observation framework. This framework represents a target task in a robot-agnostic form, so the resulting task description can be shared among various robots. When executing the task, the robot-agnostic description must be converted into commands for the target robot. To increase reusability, we first implement the skill library, a set of robot motion primitives, considering only the robot hand, regarding the rest of the robot merely as a carrier that moves the hand along the target trajectory. The skill library is thus reusable as long as the same robot hand is used. Second, we employ a generic inverse-kinematics (IK) solver so that robots can be swapped quickly. We verify the hardware-level reusability by applying two task descriptions to two different robots, Nextage and Fetch.
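The hand-centric skill library and swappable IK solver can be pictured with a short sketch. The following Python code is a hypothetical illustration of that design, not the paper's implementation; all class and method names are assumptions:

```python
# Sketch of hardware-level reuse: skills are defined only in terms of
# hand (end-effector) poses, and a generic IK solver maps each hand pose
# to joint angles of whichever robot body carries that hand.
from abc import ABC, abstractmethod

class IKSolver(ABC):
    """Robot-specific adapter; swapping robots means swapping this class."""
    @abstractmethod
    def solve(self, hand_pose):
        """Return a joint configuration that reaches the given hand pose."""

class GraspSkill:
    """A robot-agnostic motion primitive defined on hand poses only."""
    def __init__(self, approach_pose, grasp_pose):
        self.waypoints = [approach_pose, grasp_pose]

    def execute(self, ik: IKSolver, send_joint_command):
        # The arm is treated as a mere carrier: the skill emits hand-pose
        # waypoints and delegates embodiment details to the IK solver.
        for pose in self.waypoints:
            send_joint_command(ik.solve(pose))

# The same GraspSkill (and the same task description) would run on
# Nextage or Fetch by supplying that robot's IKSolver implementation.
```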
Mapping spoken text to gestures is an important research topic for robots with conversational capabilities. According to studies of human co-speech gestures, a reasonable solution for this mapping is a concept-based approach, in which text is first mapped to a semantic cluster (i.e., a concept) containing texts with similar meanings. Each concept is then mapped to a predefined gesture. Adopting the concept-based approach, this paper discusses the practical issue of obtaining concepts for the unique vocabulary of a conversational agent. Using Microsoft Rinna as the agent, we examined concepts obtained automatically through a natural language processing (NLP) approach against those obtained manually through a sociological approach. We then identified three limitations of the NLP approach: at the semantic level with emojis and symbols; at the semantic level with slang, neologisms, and buzzwords; and at the pragmatic level. We attribute these limitations to Rinna's personalized vocabulary. A follow-up experiment showed that robot gestures selected with the concept-based approach left a better impression than randomly selected gestures for Rinna's vocabulary, which suggests the effectiveness of concept-based gesture generation systems for personalized vocabularies. This study provides insights into the development of gesture generation systems for conversational agents with personalized vocabularies.
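The concept-based mapping itself is straightforward to sketch. The following Python example is a toy illustration under assumed data structures; the embedding function, concepts, and gesture names are placeholders, not Rinna's actual pipeline:

```python
# Toy sketch of concept-based gesture selection: an utterance is
# embedded, assigned to the nearest concept cluster, and the concept's
# predefined gesture is returned. All data here is illustrative.
import numpy as np

CONCEPT_CENTROIDS = {          # concept -> cluster centroid in embedding space
    "greeting": np.array([0.9, 0.1]),
    "negation": np.array([0.1, 0.9]),
}
CONCEPT_TO_GESTURE = {         # concept -> predefined gesture id
    "greeting": "wave_hand",
    "negation": "shake_head",
}

def embed(text: str) -> np.ndarray:
    """Placeholder standing in for a real sentence-embedding model."""
    return np.array([float("hello" in text), float("no" in text)])

def select_gesture(text: str) -> str:
    v = embed(text)
    concept = min(CONCEPT_CENTROIDS,
                  key=lambda c: np.linalg.norm(v - CONCEPT_CENTROIDS[c]))
    return CONCEPT_TO_GESTURE[concept]

print(select_gesture("hello there"))  # -> "wave_hand"
```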